In many disease classification tasks an accurate gene analysis is needed, for which the selection of the
most informative genes is very important, and this requires a decision technique suited to a complex
context of ambiguity. The traditional methods for selecting the most significant genes include statistical
analyses, namely the 2-Sample-T-test (2STT), Entropy, and Signal to Noise Ratio (SNR). This paper
evaluates gene selection and classification on the basis of accurate gene selection using a structured
complex decision technique (SCDT) and classifies the result using a fuzzy cluster based nearest neighbor
classifier (FC-NNC). The effectiveness of the proposed SCDT and FC-NNC is evaluated using the leave one
out cross validation (LOOCV) metric along with sensitivity, specificity, precision and F1-score, against
four different classifiers, namely 1) Radial Basis Function (RBF), 2) Multi-layer perceptron (MLP),
3) Feed Forward (FF) and 4) Support vector machine (SVM), on three different datasets: DLBCL, Leukemia
and Prostate tumor. The proposed SCDT & FC-NNC exhibit superior results and can be considered a more
accurate decision mechanism.

Keyword:
Fuzzy classification
Gene analysis
Gene selection
Machine learning
Micro array data
1. INTRODUCTION

The accuracy of diagnosis is the basis for adopting the right treatment process, especially in the
case of fatal diseases such as cancers, leukemia and prostate tumor. Along with histopathology, medical
radiology and imaging techniques, microarray data analysis could prove quite helpful if efficient
analysis techniques are developed [1]. The accuracy of disease classification or early diagnosis depends
on how accurately the genes of significance are selected.

DNA microarray data analysis is challenging both statistically and computationally, as the data
combine non-linear noise with high dimensionality and a low sample count [2]. Many efforts towards
disease classification, particularly for cancers and tumors, have been reported in the literature
[3]-[10]. Section 1.1 gives insights into the related work. Various machine learning approaches are used
for the classification, including radial basis function (RBF), artificial neural network (ANN) and
support vector machine (SVM), by formulating the problem as binary classification. The problem of
dimension reduction for finding the most significant genes has been formulated in several problem
spaces, which include 1) Mixed integer programming (MIP), 2) Bio-inspired optimization (BIO), 3) Mining
association rules (MAR), and 4) Ensemble technique (ET) [8].

A clinically comprehensive method must handle high dimensional data with veracity issues and noise
in order to resolve ambiguity during the selection of the right gene candidates. This paper proposes a
structured complex decision technique (SCDT) combined with a fuzzy clustering neighborhood cluster
(FC-NNC) classifier. Section 2 describes the complete system model for SCDT & FC-NNC, and Section 3
describes the microarray datasets. Section 4 presents the results and analysis, followed by the
conclusion in Section 5.


1.1. Background 

Accurate clustering of data is a challenging and open research problem for classification,
especially with supervised learning. An extensive survey has been conducted to understand the
effectiveness of clustering techniques for medical data such as microarray gene datasets [11]. For tumor
diagnosis, gene expression profiling is comparatively more reliable and more accurate than the
morphological analysis of tumors used in medical imaging. Traditional supervised learning approaches
suffer in accuracy because few samples of each cancer type exist in the gene expression training
dataset, and they incur overheads due to the high data dimensionality caused by the large number of
genes.

Lipo Wang et al. aim to select a small number of genes to classify cancer from microarray data,
balancing the trade-off between accuracy and computational complexity or overheads [3]. They used a
“feature importance ranking scheme” for significant gene selection and formulated the classification
problem as a cluster of binary classification problems. The machine learning approaches used in their
work are a mix of fuzzy neural network (FNN) and SVM. In terms of dimension reduction, they obtained the
same accuracy by selecting only 28 genes, compared with the 16,063 genes used by the traditional methods
of that time. The datasets explored include 1) Lymphoma Data, 2) SRBCT Data, 3) Liver Cancer Data, and
4) GCM data. They recommended considering the cooperation between genes to minimize the gene subset for
more accurate prediction.

Building on this, Chien-Pang et al. introduced a hybrid method that combines a genetic algorithm
with dynamic parameter setting for significant gene selection, and then uses SVM to verify the
efficiency of the gene selection [12]. Dimension reduction and feature selection are the core problems
to be handled, as gene expression microarray (GEMA) data consist of hundreds to sometimes thousands of
features for a very small sample size. This large number of features in a small sample makes GEMA data
very high dimensional. The conventional methods adopted for feature selection, which is also called gene
selection in GEMA analysis for disease classification, include 1) Gain & Relief, 2) Chi-square,
3) Fisher Score, and 4) Lasso.

Gene selection methods are classified into three categories, 1) supervised, 2) unsupervised and
3) semi-supervised, according to whether the data are 1) fully labeled, 2) unlabeled or 3) partially
labeled, respectively, for classification or prediction of classes, as described in [13]. Further, the
features available in GEMA samples are categorized according to two critical criteria, namely
redundancy and relevancy. Figure 1 shows the typical classification based on the combination of these
two criteria.


[Figure: "Feature Classification" tree with the branches: Irrelevancy & Noisy; Less Relevancy &
Redundant (this label appears twice in the extracted figure); Relevance]

Figure 1. Basis of gene feature selection or feature classification


Research in this area is very active: searching the IEEE Xplore digital library with the keyword
‘gene selection’ returned approximately 51 journal articles published between 2016 and 3rd February
2018. Tang et al., in their method of feature selection from GEMA, introduced an improved mutual
information correlation (MIC) to handle the distortion due to noise in genes and the challenges of
multivariate distribution estimation by adopting relevance boosting and feature enhancement [14].
Table 1 lists the trends in the methods used for gene selection.


Table 1. Trend of the Method Adoption for Gene Selection 


Sl. No and references | Gene Selection / Classification Method | Disease Classification & Dataset Used
[15] Zhang et al. (2016) | Minimum redundancy feature selection method (mRMR); Multiple Kernel Machine (MKL) learning method | Glioblastoma multiforme; Cancer Genome Atlas (TCGA) database
Azzawi et al. (2016) | Two gene selection methods; Gene expression programming (GEP)-based model | Lung cancer; Real microarray lung cancer datasets
[17] Mallik et al. (2017) | Maximal-relevance and minimal-redundancy | Epigenetic biomarker discovery; Multi-Omics Prostate Carcinoma (PC) dataset
Huerta et al. (2016) | Genetic Algorithm; Tabu Search; Support Vector Machine | Tumor classification; Diffuse Large B-cell Lymphoma
Saha et al. (2016) | Fuzzy C-means | Hypothetical condition of Yeast; Yeast Sporulation, Yeast Cell Cycle, Arabidopsis, Human Fibroblast Serum, Rat CNS
Montiel (2016) | Simulated annealing; Support vector machine | Leukemia database; Colon Cancer database
[21] Nguyen (2016) | Type-2 Fuzzy logic | Diffuse large B-cell lymphoma, leukemia cancer, and prostate
[22] Jin and Win (2016) | Swarm intelligence | Tumor classification; Gene Microarray dataset
[23] Ray et al. (2016) | Self-Organizing Map | Gene Microarray dataset
[24] Wang et al. (2016) | Matrix factorization | Gene Microarray dataset
[25] Han et al. (2017) | Particle Swarm Optimization | SRBCT Data
(first author illegible) and Wang | K-means algorithm | ALL, GCM, LYM, NCI60, MLL, HBC
[27] Feng et al. (2017) | Principal Component Analysis | PDDA-GE Dataset
[28] Omar et al. (2018) | Feature selection principle | Gene expression dataset
Ta et al. | Segmentation of microarray images | Gene Microarray dataset
[30] Hore et al. (2016) | Image segmentation | Alpert dataset


Various studies in the existing literature analyze microarray data using different forms of
clustering. Existing clustering mechanisms are highly iterative, which evidently leads to computational
complexity. Such complexity issues have not been adequately addressed by researchers to date. One
effective way to counter this complexity is to design a technique that needs only a limited number of
iterations, unlike conventional machine learning approaches. As microarray data contain a large amount
of information, there is a need for a system that can read all the explicit features of the database in
order to perform effective classification. Adopting a fuzzy-based inference system is one such approach,
in which classification accuracy and complexity can be balanced. However, existing fuzzy logic
approaches do not offer convincing classification outcomes, which is an impediment in existing research.
The next section outlines the system model of the proposed solution.


2. SYSTEM MODEL: SCDT & FC-NNC 

The proposed system model of SCDT & FC-NNC operates on datasets DSi ∈ {DLBCL (DS1), Leukemia (DS2),
Prostate Tumor (DS3)}, where i = 1, 2, 3. The characteristics of DS1, DS2 and DS3 are shown in
Tables 1(a), 1(b) and 1(c), respectively. A snapshot of each dataset is shown in Table 2.


Table 1(a). Description of DLBCL (DS1) Dataset
Dataset name | Total Gene | Total Sample | DLBCL | FL
DLBCL (DS1) | 5470 | 77 | 58 | 19


Table 1(b). Description of Leukemia (DS2)
Dataset name | Total Gene | Total Sample | ALL | AML
Leukemia (DS2) | 5328 | 72 | 47 | 25


SCDT: FC-NNC-structured Complex Decision Technique for Gene ... (Sudha V.) 


4508 O ISSN: 2088-8708 


Table 1(c). Description of Prostate Tumor (DS3)
Dataset name | Total Gene | Total Sample | Tumor | Normal
Prostate Tumor (DS3) | 10510 | 102 | 52 | 50


Table 2. Snapshot of each dataset DS1, DS2 and DS3

DLBCL (DS1)
# | 1 | 2 | 3 | 4
1 | 59 | 17480 | 3 | 384
2 | 267 | 12086 | 52 | -325
3 | 66 | 8611 | -7 | 491
4 | -37 | 24197 | 25 | -694
5 | 109 | 15109 | 38 | -108
6 | 71 | 9059 | -23 | -220
7 | 31 | 29480 | 31 | -5868
8 | 148 | 8305 | -21 | -96
9 | 84 | 10321 | 2 | -4933
10 | 53 | 10599 | -11 | -266
11 | 72 | 15842 | -32 | -5193

Leukemia (DS2)
# | 1 | 2 | 3 | 4
1 | 88 | 15091 | 7 | 311
2 | 283 | 11038 | 37 | 134
3 | 309 | 16692 | 183 | 378
4 | 12 | 15763 | 45 | 268
5 | 168 | 18128 | -28 | 118
6 | 71 | 34207 | 65 | 154
7 | 55 | 30801 | 43 | 80
8 | -2 | 25147 | 338 | 269
9 | 268 | 15272 | 29 | 188
10 | 219 | 21801 | -36 | -39
11 | 82 | 18167 | -8 | 115

Prostate Tumor (DS3)
# | 1 | 2 | 3 | 4
1 | 6.1000 | -0.1000 | 11.9000 | 14.4000
2 | 1 | 0 | 2 | 4
3 | 22 | 2 | 51 | 52
4 | 14 | 6 | 15 | 21
5 | 13 | 4 | 39 | 25
6 | 20 | 1 | 23 | 29
7 | 16 | 8 | 47 | 33
8 | 13 | 0 | 29 | 15
9 | 32 | 8 | 96 | 37
10 | 18 | 14 | 58 | 32


2.1. Gene Selection Method: Conventional and Proposed SCDT 

Three conventional gene selection methods, 1) Two sample T-test (2STT), 2) Entropy test (ET) and
3) Signal-to-Noise Ratio (SNR), are evaluated along with the proposed Structured Complex Decision
Technique (SCDT), using random sample size selection (S) for gene ranking and visualizing the top-k
genes, where k = 3. Section 2.1.1 describes 2STT.


2.1.1. Two sample T-test (2STT) 

In this test, two independent samples of data are taken, and the 2-Sample-T-test (2STT) determines
whether the difference between their means is significant. In the context of gene selection, the 2STT is
performed on each gene, with the expression levels separated on the basis of the class variable. A
larger value of ‘abs(t)’ indicates that the gene is more important. If the two samples have sizes $n_1$
and $n_2$, sample means $\mu_1$ and $\mu_2$, and sample standard deviations $\sigma_1$ and $\sigma_2$,
then the value of t is computed by Equation 1.


$$t = \frac{\mu_1 - \mu_2}{\sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}}$$ (1)
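
The per-gene test can be sketched as follows. This is a minimal illustration, assuming the unpooled form of the t statistic and a genes x samples matrix layout; the function name, the mask argument and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def two_sample_t(expr, is_class1):
    """Absolute two-sample t statistic per gene (unpooled form, an assumption).

    expr      : genes x samples expression matrix (hypothetical layout)
    is_class1 : boolean mask over samples, True for the first class
    """
    v1, v2 = expr[:, is_class1], expr[:, ~is_class1]
    n1, n2 = v1.shape[1], v2.shape[1]
    mu1, mu2 = v1.mean(axis=1), v2.mean(axis=1)
    # ddof=1 yields the sample (not population) standard deviation
    sd1, sd2 = v1.std(axis=1, ddof=1), v2.std(axis=1, ddof=1)
    return np.abs(mu1 - mu2) / np.sqrt(sd1**2 / n1 + sd2**2 / n2)

# Toy data: gene 0 separates the two classes, gene 1 does not
expr = np.array([[1.0, 2.0, 3.0, 7.0, 8.0, 9.0],
                 [1.0, 5.0, 9.0, 2.0, 5.0, 8.0]])
is_class1 = np.array([True, True, True, False, False, False])
t_scores = two_sample_t(expr, is_class1)
ranking = np.argsort(-t_scores)  # gene indices, most discriminative first
```

Sorting by the negated score places the gene with the largest abs(t) first, which matches the ranking rule stated above.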


2.1.2. Entropy Test (ET)

When the classes are assumed to be normally distributed, the relative entropy (RE), also known as
the Kullback-Leibler distance or divergence, is computed using Equation 2. The genes with the highest
entropy values are selected as input to the classification module.


$$RE = \frac{1}{2}\left[\left(\frac{\sigma_1^2}{\sigma_2^2} + \frac{\sigma_2^2}{\sigma_1^2} - 2\right) + \left(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}\right)(\mu_1 - \mu_2)^2\right]$$ (2)
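
A minimal sketch of this criterion is given below. It assumes the symmetric relative entropy between two univariate normal distributions, which is one common reading of the entropy test; the function name, matrix layout and toy data are hypothetical.

```python
import numpy as np

def relative_entropy(expr, is_class1):
    """Symmetric relative entropy per gene under a normality assumption.

    expr      : genes x samples expression matrix (hypothetical layout)
    is_class1 : boolean mask over samples, True for the first class
    """
    v1, v2 = expr[:, is_class1], expr[:, ~is_class1]
    mu1, mu2 = v1.mean(axis=1), v2.mean(axis=1)
    var1, var2 = v1.var(axis=1, ddof=1), v2.var(axis=1, ddof=1)
    spread = var1 / var2 + var2 / var1 - 2.0               # variance mismatch term
    shift = (1.0 / var1 + 1.0 / var2) * (mu1 - mu2) ** 2   # mean separation term
    return 0.5 * (spread + shift)

# Toy data: gene 0 has well separated class means, gene 1 has equal means
expr = np.array([[1.0, 2.0, 3.0, 7.0, 8.0, 9.0],
                 [1.0, 5.0, 9.0, 2.0, 5.0, 8.0]])
is_class1 = np.array([True, True, True, False, False, False])
re_scores = relative_entropy(expr, is_class1)
```

A gene with zero within-class variance would cause a division by zero here; a practical implementation would need to filter or regularize such genes.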
2.1.3. Signal to Noise Ratio (SNR)
SNR defines a relative class-separation metric in terms of signal quality and noise.
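
The text does not give the SNR formula, so the sketch below assumes the definition commonly used for gene ranking, the absolute difference of class means divided by the sum of class standard deviations. That choice, the function name and the toy data are assumptions rather than the paper's stated method.

```python
import numpy as np

def signal_to_noise(expr, is_class1):
    """Assumed SNR per gene: |mu1 - mu2| / (sd1 + sd2).

    expr      : genes x samples expression matrix (hypothetical layout)
    is_class1 : boolean mask over samples, True for the first class
    """
    v1, v2 = expr[:, is_class1], expr[:, ~is_class1]
    mu1, mu2 = v1.mean(axis=1), v2.mean(axis=1)
    sd1, sd2 = v1.std(axis=1, ddof=1), v2.std(axis=1, ddof=1)
    return np.abs(mu1 - mu2) / (sd1 + sd2)

# Toy data: gene 0 separates the two classes, gene 1 does not
expr = np.array([[1.0, 2.0, 3.0, 7.0, 8.0, 9.0],
                 [1.0, 5.0, 9.0, 2.0, 5.0, 8.0]])
is_class1 = np.array([True, True, True, False, False, False])
snr_scores = signal_to_noise(expr, is_class1)
```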


2.2. Gene Ranking Algorithms

The proposed SCDT gene selection algorithm takes input from three different algorithms, namely
2STT, ET and SNR, to select the highest-ranking genes for classification. After executing the algorithm
below, a snapshot of the genes selected by each individual method is taken and shown in Table 3.


SCDT: Gene Ranking Algorithm (GR-Algorithm)
Create empty vectors for 2STT, ET, SNR
for each gene Gi
    [v1, v2] <- f(All, FL)
    [mu1, mu2] <- f_mean(v1, v2)
    [sd1, sd2] <- f_std(v1, v2)
    [n1, n2] <- f_len(v1, v2)
    2STT <- formula
    ET <- formula
    SNR <- formula
Process for proposed SCDT
    Normalization of 2STT, ET, SNR
    [2STT, ET, SNR] <- [2STT / f_max(2STT), ET / f_max(ET), SNR / f_max(SNR)]
    SCDT <- f(2STT, ET, SNR)
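
The normalization and combination steps can be sketched as below. The combining function in the final step of the algorithm is not legible in the source, so this sketch assumes a simple average of the three max-normalized scores; that choice, the function name `scdt_rank` and the example score values are hypothetical.

```python
import numpy as np

def scdt_rank(t_scores, et_scores, snr_scores, k=3):
    """Combine three per-gene score vectors and return the top-k gene indices.

    Assumes all scores are non-negative and each vector has a positive maximum.
    """
    scores = np.vstack([t_scores, et_scores, snr_scores]).astype(float)
    # Divide each method's scores by that method's maximum, so every row peaks at 1
    scores = scores / scores.max(axis=1, keepdims=True)
    # ASSUMPTION: the combined SCDT score is the mean of the normalized scores
    combined = scores.mean(axis=0)
    top_k = np.argsort(-combined)[:k]
    return combined, top_k

# Hypothetical scores for four genes from the three conventional methods
t_scores = np.array([7.3, 0.5, 3.0, 1.0])
et_scores = np.array([36.0, 0.2, 9.0, 30.0])
snr_scores = np.array([3.0, 0.1, 2.5, 0.4])
combined, top_k = scdt_rank(t_scores, et_scores, snr_scores, k=3)
```

Max-normalization keeps any single criterion from dominating only because of its numeric scale, which appears to be the purpose of the normalization step in the algorithm.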


Table 3. Snapshot of Selected Gene by 2STT, ET, SNR and SCDT

[Plots: Top gene selected from Two Sample T-test (2STT); Top gene selected from Entropy Test (ET);
Top gene selected from Signal to Noise Ratio test (SNRT); Top gene selected from proposed SCDT]


2.3. Classification based on the Selected Gene
The system is evaluated with five different classifiers: four conventional ones, namely 1) Radial
Basis Function, 2) MLP, 3) Feed forward and 4) SVM, and finally the proposed FC-NNC.

2.3.1. Radial Basis Function (RBF)
The radial basis network (RBN) approximates a function by adding units to the hidden layer of the
RBN until it reaches the specified (target) mean squared error.


f(Input vector, Target class value) -> RBN


2.3.2. Multilayer Perceptron (MLP)

The MLP is a type of neural network that uses a feed forward-based learning mechanism with three
distinct layers of nodes. It adopts a supervised learning approach known as the back propagation
algorithm. The MLP is known for its capability to distinguish data that are not linearly separable. It
uses sigmoid activation functions, which are represented by


$$y(v_i) = \tanh(v_i) \quad \text{and} \quad y(v_i) = (1 + e^{-v_i})^{-1}$$


The first function is a hyperbolic tangent with a range of [-1, 1], while the second is the logistic
function with a range of [0, 1].
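
As a small illustration of the two activation functions described above, the following sketch evaluates both at a few points; the function names are chosen here for illustration only.

```python
import math

def tanh_activation(v):
    """Hyperbolic tangent activation, with outputs between -1 and 1."""
    return math.tanh(v)

def logistic_activation(v):
    """Logistic activation (1 + e^(-v))^(-1), with outputs between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-v))

inputs = [-3.0, 0.0, 3.0]
tanh_out = [tanh_activation(v) for v in inputs]
logistic_out = [logistic_activation(v) for v in inputs]
```

Both functions are monotonically increasing; they differ mainly in output range, which affects how the target class values should be encoded.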


2.3.3. Feed Forward 

Feed forward processing is one of the frequently used schemes in which information flows in a single
direction only. It is carried out in both single-layer and multi-layer perceptrons, each of which has
pros and cons. The advantage of the single-layer perceptron is its simplicity, while that of the
multi-layer perceptron is its capability to solve complex problems. On the other hand, the disadvantages
of the single-layer and multi-layer perceptrons are higher computational time and an increasing number
of iterations, respectively.


2.3.4. Support Vector Machine (SVM) 

SVM is a machine learning method that uses supervised learning to perform regression or
classification. It can perform both linear and non-linear classification effectively, even when the
input has a high degree of dimensionality. Applying the algorithm requires all the data to be labeled.
Regression, outlier identification and classification are carried out using a hyperplane. The scheme
also controls the computational load by expressing dot products through a kernel function k(x, y).


2.3.5. Fuzzy Clustering Neighborhood Cluster (FC-NNC)

The prime intention of FC-NNC is to embed a greater degree of freedom in both fuzzy inference models
(Mamdani and Takagi-Sugeno) in order to enhance the capability to address uncertainties. This version of
fuzzy logic is more capable than the conventional one, as it offers more practicality in the inference
process. In a conventional fuzzy logic implementation, the crisp inputs are given to a fuzzifier, whose
fuzzy sets are forwarded to an inference block controlled by a set of fuzzy rules. The fuzzy outputs are
then forwarded to a defuzzifier to obtain crisp outputs. The proposed FC-NNC performs the same steps up
to the inference block, but the subsequent stage is significantly amended. The fuzzy outputs of the
inference block are subjected to a special form of output processing: a type reducer takes the fuzzy
output sets, processes them, and forwards the result to the defuzzifier block. Two outputs are obtained
from the FC-NNC process: a crisp output and a type-reduced set.


3. MICROARRAY DATA SET 

A microarray is a collection of a large number of DNA spots, and this information is used to compute
the degree of expression associated with each gene. Gene expression data are usually represented as an
expression matrix, in which each column holds the data of a single experiment and the rows together hold
the complete collection of experimental data. A microarray database is an archive of various forms of
microarray data consisting essentially of gene expression information. The prime purpose of such a
database is to manage the data effectively with the aid of a data index that assists in generating
queries. Examples of microarray databases are ArrayTrack, ImmGen database, ArrayExpress, GeneNetwork,
MUSC, UPSC-BASE and the Stanford Microarray database. All these databases offer voluminous gene
expression information for public use. With more than 60,000 data samples, they hold millions of gene
expression profiles.


4. RESULTS AND ANALYSIS 

This section discusses the results obtained from the proposed system. The outcome is analyzed with
the aid of the F1-score, the leave one out cross validation (LOOCV) metric, sensitivity, specificity,
and precision. The section describes each of these metrics individually and presents the analysis of the
results.


4.1. Analysis of F1-Score 

In binary classification, the F1-score measures the accuracy of a test and is computed from the
precision and recall (sensitivity) values. The F1-score is computed by Equation 3, where P = precision
and S = sensitivity. The best value of the F1-score is 1 and the worst is 0.


$$F1\text{-}Score = 2 \times \frac{P \times S}{P + S}$$ (3)
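
A short worked example of Equation 3 follows. The function name and the guard for the zero-denominator case are illustrative choices, not part of the paper.

```python
def f1_score(precision, sensitivity):
    """F1-score as the harmonic mean of precision (P) and sensitivity (S)."""
    if precision + sensitivity == 0:
        return 0.0  # convention chosen here to avoid dividing by zero
    return 2.0 * (precision * sensitivity) / (precision + sensitivity)

perfect = f1_score(1.0, 1.0)      # both P and S at their best value
unbalanced = f1_score(0.5, 1.0)   # high sensitivity but low precision
```

Because the harmonic mean is pulled towards the smaller of its two inputs, a classifier cannot reach a high F1-score by maximizing sensitivity alone.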


The outcome shown in Figure 2 highlights that the F1-score of the proposed SCDT is much better than
that of the existing approaches, i.e. SNR, entropy, and t-test. Likewise, the proposed FC-NNC offers
better performance than existing machine learning techniques, i.e. RBF, MLP, Feed-forward, and support
vector machine. A closer look at the performance shows that the proposed SCDT offers a better F1-score
than FC-NNC. Over the gene range of 5-30, the F1-score of the proposed SCDT is 1 at the smallest number
of genes. As the number of genes increases the F1-score reduces, but it returns to the highest score of
1 at 30 genes. This shows that the proposed SCDT achieves the best value of the F1-score. At the lower
gene counts, 2STT, SNR and then Entropy exhibit the better performance, in decreasing order, whereas at
higher gene counts SNR gets better results than 2STT and entropy (both of which exhibit the same value).
Figure 3 shows the performance of the F1-score versus the number of genes for the classifiers.


[Plot: x-axis No of Genes, 5 to 30]

Figure 2. Performance of F1-score vs changing value of numbers of gene (a) 2STT, (b) Entropy test,
(c) SNRT, (d) Proposed SCDT

[Plot: x-axis No of Genes, 5 to 30; legend: RBF, MLP, FeedForward, SVM, FC-NNC]

Figure 3. Performance of F1-score vs changing value of numbers of gene (a) RBF, (b) MLP, (c) Feed
forward, (d) SVM, (e) Proposed FC-NNC


4.2. Analysis of Leave One Out Cross Validation Metric (LOOCV) 

Usually, the number of samples in a microarray gene expression dataset is very small; therefore, to
make exhaustive use of the data for training, the leave one out cross validation (LOOCV) method is used.
In LOOCV the entire dataset of K samples is divided into K distinct subsets of one sample each. In each
round, K-1 samples are used for training and the remaining k-th sample is used for testing. The accuracy
of LOOCV is computed by Equation 4, where A is the count of correctly classified samples.


$$LOOCV = \frac{A}{K}$$ (4)
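
The procedure can be sketched as below. For brevity the sketch uses a plain 1-nearest-neighbor rule as a stand-in classifier, which is an assumption made for illustration and not the proposed FC-NNC; the function name, the samples x genes layout and the toy data are likewise hypothetical.

```python
import numpy as np

def loocv_accuracy(X, y):
    """Leave-one-out accuracy: fraction of held-out samples classified correctly.

    X : samples x selected-genes matrix, y : class label per sample.
    The classifier is a 1-nearest-neighbor rule used only as a placeholder.
    """
    n = len(y)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i                        # hold out sample i
        dists = np.linalg.norm(X[train] - X[i], axis=1)  # distances to training samples
        predicted = y[train][np.argmin(dists)]           # label of the nearest one
        correct += int(predicted == y[i])
    return correct / n

# Toy data: two well separated classes measured on a single gene
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
y = np.array([0, 0, 0, 1, 1, 1])
accuracy = loocv_accuracy(X, y)
```

Every sample serves as the test case exactly once, which is why the scheme suits datasets with only tens of samples.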

A comparison of the trends of SCDT and FC-NNC shows that SCDT offers an increasing value of LOOCV in
comparison to FC-NNC as the number of genes increases. This is another clear indication that the
proposed SCDT could offer a better degree of information when classifying a disease or any other form of
abnormality in microarray data. So far, the trend of SCDT shows similar consistency for both F1-score
and LOOCV. Figure 4 shows the performance of LOOCV versus the number of genes for the gene selection
methods, and Figure 5 shows it for the classifiers.


[Plot: y-axis LOOCV; x-axis No of Genes, 5 to 30]

Figure 4. Performance of LOOCV vs changing value of numbers of gene (a) 2STT, (b) Entropy Test,
(c) SNRT, (d) Proposed SCDT

Figure 5. Performance of LOOCV vs changing value of numbers of gene (a) RBF, (b) MLP, (c) Feed
forward, (d) SVM, (e) Proposed FC-NNC


4.3. Analysis of Precision
Precision is one of the elementary parameters used in pattern recognition as well as in
classification problems. Precision is computed by Equation 5.


$$P = \frac{|RI \cap EI|}{|EI|}$$ (5)


The above expression shows that precision P is calculated by dividing the amount of extracted
information EI that is also relevant information RI by the total amount of extracted information EI.
The expression is interpreted as a probability. The outcomes obtained are shown in Figure 6 and
Figure 7.
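
A minimal sketch of this computation is shown below. It reads Equation 5 as the usual set-based precision, the share of extracted items that are also relevant, which is an assumption about the garbled formula; the function name and sample identifiers are hypothetical.

```python
def precision(relevant, extracted):
    """Share of extracted items that are also relevant (set-based reading)."""
    extracted = set(extracted)
    if not extracted:
        return 0.0  # convention chosen here when nothing was extracted
    return len(set(relevant) & extracted) / len(extracted)

# Hypothetical sample identifiers
relevant_items = {1, 2, 3, 4}    # samples that truly belong to the class
extracted_items = {2, 3, 5}      # samples the classifier assigned to the class
p = precision(relevant_items, extracted_items)
```

In classification terms the numerator is the number of true positives and the denominator is the number of true positives plus false positives.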


[Plot: y-axis Precision; x-axis No of Genes, 5 to 30]

Figure 6. Performance of precision vs changing value of numbers of gene (a) 2STT, (b) Entropy test,
(c) SNRT, (d) Proposed SCDT

[Plot: y-axis Precision; x-axis No of Genes, 5 to 30; legend: RBF, MLP, FeedForward, SVM, FC-NNC]

Figure 7. Performance of precision vs changing value of numbers of gene (a) RBF, (b) MLP,
(c) FeedForward, (d) SVM, (e) Proposed FC-NNC


A closer look at the curves in Figure 6 and Figure 7 shows that both SCDT and FC-NNC offer a similar
linear pattern of precision with an increasing number of genes. This outcome indicates that,
irrespective of the number of genes, the proposed system using this form of fuzzy logic (not the
conventional singleton one) yields consistent, and therefore predictable, outcomes. This predictability
in precision adds value when classifying any form of clinical abnormality in microarray data.


4.4. Analysis of Sensitivity 

Sensitivity is another frequently used parameter for assessing classification performance. It
measures the proportion of positive cases that are accurately identified. Sensitivity is calculated as
follows:


$$Sen = \frac{X}{X + Y}$$ (6)


In Equation 6, X is the number of true positive identifications of a clinical abnormality and Y is
the number of false negative identifications. The graphical outcome of sensitivity is shown in Figure 8
and Figure 9.


[Plot: y-axis Sensitivity; x-axis No of Genes, 5 to 30]

Figure 8. Performance of Sensitivity vs changing value of Numbers of Gene (a) 2STT, (b) Entropy Test,
(c) SNRT, (d) Proposed SCDT

Figure 9. Performance of Sensitivity vs changing value of Numbers of Gene (a) RBF, (b) MLP, (c) Feed
Forward, (d) SVM, (e) Proposed FC-NNC


The study outcome shows a higher value of sensitivity for both the proposed SCDT and FC-NNC as the
number of genes increases. This indicates that the identification success rate of the proposed system is
good in comparison to the existing approaches.


4.5. Specificity 

The proposed system uses specificity as the final performance parameter for assessing classification
performance. This parameter measures how accurately negative cases are rejected, i.e. how well false
positives that could distort the outcome are avoided. Specificity is calculated as follows:


$$Spe = \frac{A}{A + B}$$ (7)


In the above expression, A represents the number of true negatives while B represents the number of
false positives. The graphical outcome of specificity is shown in Figure 10 and Figure 11.
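
Sensitivity and specificity can both be read off a confusion matrix, as the sketch below illustrates. It assumes the standard definitions, with false positives in the specificity denominator; the function name and label vectors are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) from true and predicted labels."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1   # abnormal sample correctly flagged
            else:
                fn += 1   # abnormal sample missed
        else:
            if pred == positive:
                fp += 1   # normal sample wrongly flagged
            else:
                tn += 1   # normal sample correctly rejected
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec

# Hypothetical labels: 1 = abnormal (positive), 0 = normal (negative)
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

In this toy case three of four positives and three of four negatives are classified correctly, so both measures come out equal.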


[Plot: x-axis No of Genes, 5 to 30]

Figure 10. Performance of specificity vs changing value of numbers of gene (a) 2STT, (b) Entropy Test,
(c) SNRT, (d) Proposed SCDT

[Plot: x-axis No of Genes]

Figure 11. Performance of specificity vs changing value of numbers of gene (a) RBF, (b) MLP, (c) Feed
forward, (d) SVM, (e) Proposed FC-NNC


A higher value of specificity directly indicates that the proposed system has little chance of
producing false positives, and hence the accuracy of the proposed system is retained. In this case,
FC-NNC offers better performance than SCDT and the other systems with respect to the linearity of its
outcome.


5. CONCLUSION 

This paper discusses a simple model for solving the classification problem, with microarray data as
the case study. The approach is meant for classifying cancer-related gene expression using a modified
version of fuzzy logic. The contribution of the proposed system is that, unlike conventional fuzzy
logic, it does not depend on a large rule set to achieve better clustering performance. Different
accuracy-related performance parameters have been considered for assessing the classification
performance, and the outcome shows that the proposed system offers better classification accuracy than
the other existing systems.